Greedy Sharing: Load Balancing on Weakly Consistent Memory

Authors

  • Umut A. Acar
  • Arthur Charguéraud
  • Mike Rainey

Abstract

An efficient online scheduler is crucial for balancing irregular parallel computations in a multiprocessor system. Over the last two decades, variants of the work-stealing scheduler have emerged as a popular choice for hardware shared-memory systems. State-of-the-art work-stealing algorithms can guarantee near-optimal asymptotic complexity by relying on simple yet powerful techniques to balance the total load among processors. Implementations of work-stealing algorithms, however, continue to rely on synchronization operations, such as atomic read-modify-write operations (e.g., compare-and-swap), to guarantee the correctness of concurrent accesses to shared task pools. Furthermore, since work-stealing algorithms are traditionally designed under the assumption of a sequentially consistent memory model, their implementations require additional memory fences on modern multiprocessor machines. Memory fences and atomic read-modify-write operations are known to be expensive in general, especially because they can require exclusive access to shared resources such as memory. In this paper, we present the greedy-sharing algorithm for load balancing on weakly consistent hardware shared-memory architectures, such as modern multicore computers. Greedy sharing combines ideas from work-sharing and work-stealing algorithms to eliminate all synchronization operations. As in work sharing, busy processors perform load balancing by sharing their work; as in work stealing, tasks migrate only from busy to idle processors. In greedy sharing, data races can occur when multiple processors target the same idle processor. To recover safely from such data races, we design a task-sharing protocol that uses a particular time-stamping technique attributed to Lamport. We present a specification of the algorithm, prove its correctness on the x86-TSO weak memory model, and prove an upper bound on its execution time. We have implemented the algorithm as part of our C++ library for parallel programming, and we present experiments showing that the algorithm is practical.
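To make the race described above concrete, here is a minimal sketch, assuming a sequential, round-based simulation. It is not the paper's protocol, which the abstract does not spell out, and it does not model x86-TSO or real threads. Each idle processor owns a hypothetical single-slot `Mailbox` and a `request` counter that it bumps whenever it opens a new request; this counter loosely stands in for the time-stamping idea. Busy processors with at least two tasks write `(sender, stamp, task)` into an idle processor's mailbox with a plain overwrite, so when several senders target the same idle processor only the last write survives. Each sender then reads the mailbox back and reclaims its task if its write was lost. All names (`Mailbox`, `Processor`, `share_round`, `total_tasks`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical single-slot mailbox owned by an idle processor.
struct Mailbox {
    int sender = -1;   // id of the last processor that wrote here
    long stamp = -1;   // request stamp the sender observed
    int task = -1;     // shared task id (-1 means empty)
};

struct Processor {
    std::deque<int> tasks;  // local task pool
    long request = 0;       // bumped each time the processor opens a request
    Mailbox box;
    bool idle() const { return tasks.empty(); }
};

std::size_t total_tasks(const std::vector<Processor>& procs) {
    std::size_t total = 0;
    for (const Processor& p : procs) total += p.tasks.size();
    return total;
}

// One round of the simulation; returns how many tasks migrated.
int share_round(std::vector<Processor>& procs) {
    const int n = static_cast<int>(procs.size());
    const std::size_t before = total_tasks(procs);
    std::vector<bool> was_idle(n);

    // Phase 1: idle processors open a fresh request and clear their mailbox.
    for (int i = 0; i < n; ++i) {
        was_idle[i] = procs[i].idle();
        if (was_idle[i]) {
            ++procs[i].request;
            procs[i].box = Mailbox{};
        }
    }

    // Phase 2: busy processors with spare work write into the next idle
    // processor's mailbox. Plain overwrite: concurrent senders "race".
    struct Pending { int sender, target, task; long stamp; };
    std::vector<Pending> pending;
    for (int i = 0; i < n; ++i) {
        Processor& p = procs[i];
        if (was_idle[i] || p.tasks.size() < 2) continue;
        for (int k = 1; k < n; ++k) {
            const int t = (i + k) % n;
            if (!was_idle[t]) continue;
            const int task = p.tasks.back();
            p.tasks.pop_back();
            const long stamp = procs[t].request;
            procs[t].box = Mailbox{i, stamp, task};  // may clobber another sender
            pending.push_back({i, t, task, stamp});
            break;
        }
    }

    // Phase 3: each sender reads back; if its write was overwritten,
    // it keeps the task instead of losing it.
    int migrated = 0;
    for (const Pending& w : pending) {
        const Mailbox& b = procs[w.target].box;
        if (b.sender == w.sender && b.stamp == w.stamp) {
            ++migrated;
        } else {
            procs[w.sender].tasks.push_back(w.task);
        }
    }

    // Phase 4: an idle processor accepts a task only if it carries the
    // stamp of its current request.
    for (int i = 0; i < n; ++i) {
        Mailbox& b = procs[i].box;
        if (was_idle[i] && b.task != -1 && b.stamp == procs[i].request) {
            procs[i].tasks.push_back(b.task);
            b = Mailbox{};
        }
    }

    assert(total_tasks(procs) == before);  // no task lost or duplicated
    return migrated;
}
```

With three busy processors and one idle processor, all three senders target the same mailbox. Only the last writer's task migrates, and the other two senders detect the overwrite and keep their tasks. A real weak-memory implementation needs a much more careful argument than this read-back check, which is what the paper's correctness proof addresses.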

Similar resources

Analysing the Impact of Heterogeneity with Greedy Resource Allocation Algorithms for Dynamic Load Balancing in Heterogeneous Distributed Computing System

Heterogeneous distributed systems have been an active research area in computer science for the last two decades, and task allocation and load balancing have been major issues associated with such systems. The load-balancing problem attempts to compute the assignment with the smallest possible makespan (i.e., the completion time at the most heavily loaded computing node). This paper presents and discusses ...

Efficient Massive Sharing of Content among Peers

In this paper we focus on the design of high performance peer-to-peer content sharing systems. In particular, our goal is to achieve global load balancing and short user-request response times. This is a formidable challenge, given the requirement to respect the autonomy of peers, their heterogeneity in terms of processing and storage capacities, their different content contributions, the huge ...

Flexible load balancing software for parallel applications in a time-sharing environment

Networks of workstations are becoming increasingly appropriate for parallel applications, as modern network technology enables high-quality communication between powerful workstations. From this perspective, load-balancing software must be extremely flexible, as the set of available nodes for a particular distributed-memory application may change at run time. XENOOPS is an advanced environment for parall...

A framework for scalable greedy coloring on distributed-memory parallel computers

We present a scalable framework for parallelizing greedy graph coloring algorithms on distributed-memory computers. The framework unifies several existing algorithms and blends a variety of techniques for creating or facilitating concurrency. The latter techniques include exploiting features of the initial data distribution, the use of speculative coloring and randomization, and a BSP-style org...

Atomic Read-Modify-Write Operations are Unnecessary for Shared-Memory Work Stealing

We present a work-stealing algorithm for total-store memory architectures, such as Intel's x86, that does not rely on atomic read-modify-write instructions such as compare-and-swap. In our algorithm, processors communicate solely by reading from and writing (non-atomically) into weakly consistent memory. We also show that join resolution, an important problem in scheduling parallel programs, can...

Publication date: 2012